Abstract. We address the problem of studying the toric ideals of phylogenetic
invariants for a general group-based model on an arbitrary claw tree. We focus
on the group $\mathbb{Z}_2$ and choose a natural recursive approach that extends
to other groups. The study of the lattice associated with each phylogenetic
ideal produces a list of circuits that generate the corresponding lattice basis
ideal. In addition, we describe explicitly a quadratic lexicographic Gröbner
basis of the toric ideal of invariants for the claw tree on an arbitrary number
of leaves. Combined with a result of Sturmfels and Sullivant, this implies that
the phylogenetic ideal of every tree for the group $\mathbb{Z}_2$ has a
quadratic Gröbner basis. Hence, the coordinate ring of the toric variety is a
Koszul algebra.

1 Introduction

Phylogenetics is concerned with determining genetic relationships between
species based on their DNA sequences. First, the various DNA sequences are
aligned, that is, a correspondence is established that accounts for their
differences. Assuming that all DNA sites evolve identically and independently,
the focus is on one site at a time. The data then consist of observed pattern
frequencies in aligned sequences. These observed data are used to estimate the
true joint probabilities of the observations and, most importantly, to
reconstruct the ancestral relationship among the species. The relationship can
be represented by a phylogenetic tree.

A phylogenetic tree $T$ is a simple, connected, acyclic graph equipped with
some statistical information. Namely, each node of $T$ is a random variable with
$k$ possible states chosen from the state space $S$. Edges of $T$ are labeled by
transition probability matrices that reflect probabilities of changes of the
states from a node to its child. These probabilities of mutation are the
parameters for the statistical model of evolution, which is described in terms
of a discrete-state continuous-time Markov process on the tree. Since the goal
is to reconstruct the tree, the interior nodes are hidden. The relationship
between the random variables is encoded by the structure of the tree. At each of
the $n$ leaves, we can observe any of the $k$ states; thus there are $k^n$
possible observations. Let $p_\sigma$ be the joint probability of making a
particular observation $\sigma \in S^n$ at the leaves. Then $p_\sigma$ is a
polynomial in the model parameters.

A phylogenetic invariant of the model is a polynomial in the leaf probabilities
which vanishes for every choice of model parameters. The set of these
polynomials forms a prime ideal in the polynomial ring over the unknowns
$p_\sigma$. The objective is to compute this ideal explicitly. Thus we consider
a polynomial map $\phi : \mathbb{C}^N \to \mathbb{C}^{k^n}$, where $N$ is the
total number of model parameters. The map depends only on the tree $T$ and the
number of states $k$; its coordinate functions are the $k^n$ polynomials
$p_\sigma$. The map $\phi$ induces a parametrization of an algebraic variety.
The study of these algebraic varieties for various statistical models is a
central theme in the field of algebraic statistics ([H]). Phylogenetic
invariants are a powerful tool for tree reconstruction ([2], [3], [7]).

There is a specific class of models for which the ideal of invariants is
particularly nice. Let $M_e$ be the $k \times k$ transition probability matrix
for edge $e$ of $T$. In the general Markov model, each matrix entry is an
independent model parameter. A group-based model is one in which the matrices
$M_e$ are pairwise distinct, but it is required that certain entries coincide.
For these models, transition matrices are diagonalizable by the Fourier
transform of an abelian group. The key idea behind this linear change of
coordinates is to label the states (for example, A, C, G, and T) by a finite
abelian group (for example, $\mathbb{Z}_2 \times \mathbb{Z}_2$) in such a way
that transition from one state to another depends only on the difference of the
group elements. Examples of group-based models include the Jukes-Cantor and
Kimura's one-parameter models used in computational biology.

Sturmfels and Sullivant in [11] reduce the computation of ideals of
phylogenetic invariants of group-based models on an arbitrary tree to the case
of claw trees $T_n := K_{1,n}$, the complete bipartite graph from one node (the
root) to $n$ nodes (the leaves). The main result of [11] gives a way of
constructing the ideal of phylogenetic invariants for any tree if the ideal for
the claw tree is known. However, in general, it is an open problem to compute
the phylogenetic invariants for a claw tree. We consider the ideal for a general
group-based model for the group $\mathbb{Z}_2$. Let $q_\sigma$ be the image of
$p_\sigma$ under the Fourier transform. Assuming the identity labeling function
and adopting the notation of [11], the ideal of phylogenetic invariants for the
tree $T_n$ is the kernel of the following homomorphism between polynomial rings:

\[ \varphi_n : \mathbb{C}[q_{g_1,\dots,g_n} : g_1,\dots,g_n \in G] \to \mathbb{C}[a_g^{(i)} : g \in G,\ i = 1,\dots,n+1], \]
\[ q_{g_1,\dots,g_n} \mapsto a_{g_1}^{(1)} a_{g_2}^{(2)} \cdots a_{g_n}^{(n)} a_{g_1 + g_2 + \cdots + g_n}^{(n+1)}, \]

where $G$ is a finite group with $k$ elements, each corresponding to a state.
The coordinate $q_{g_1,\dots,g_n}$ corresponds to observing the element $g_1$ at
the first leaf of $T$, $g_2$ at the second, and so on. The phylogenetic
invariants form a toric ideal in the Fourier coordinates $q_\sigma$, which can
be computed from the corresponding lattice basis ideal by saturation. The main
result of this paper is a complete description of the lattice basis ideal and a
quadratic Gröbner basis of the ideal of invariants for the group $\mathbb{Z}_2$
on $T_n$ for any number of leaves $n$.

Our paper is organized as follows. In section 2 we lay the foundation for our 
recursive approach. The ideal of the two-leaf claw tree is trivial, so we begin 
with the case when the number of leaves is three. Sections 3 and 4 address the 
problem of describing the lattices corresponding to the toric ideals. We provide 
a nice lattice basis consisting of circuits. The corresponding lattice basis ideal is 
generated by circuits of degree two and thus in particular satisfies the Sturmfels- 
Sullivant conjecture. 

The ideal of phylogenetic invariants is the saturation of the lattice basis ideal. 
However, we do not use any of the standard algorithms to compute saturation 
(e.g. [8], [10]). Instead, our recursive construction of the lattice basis ideals can 
be extended to give the full ideal of invariants, which we describe in the final 
section. The recursive description of these ideals depends only on the number of 
leaves of the claw tree and it does not require saturation. Finally, and possibly 
somewhat surprisingly, we show that the ideal of invariants for every claw tree 
admits a quadratic Gröbner basis with respect to a lexicographic term order.
We describe it explicitly. 

Combined with the main result of Sturmfels and Sullivant in [11], this implies
that the phylogenetic ideal of every tree for the group $\mathbb{Z}_2$ has a
quadratic Gröbner basis. Hence, the coordinate ring of the toric variety is a
Koszul algebra. In addition, the ideals for every tree can be computed
explicitly. These ideals are particularly nice as they satisfy the conjecture in
[11] which proposes that the order of the group gives an upper bound for the
degrees of minimal generators of the ideal of invariants. The case of
$\mathbb{Z}_2$ has been solved in [11] using a technique that does not
generalize. We hope to extend our recursive approach and obtain the result for
an arbitrary abelian group.

For a detailed background on phylogenetic trees, invariants, group-based 
models, Fourier coordinates, labeling functions and more, the reader should refer 
to [TJ, i, 0, [UJ. 

2 Matrix representation 

Fix a claw tree $T_n$ on $n$ leaves and a finite abelian group $G$ of order
$k$. Soon we will specialize to the case $k = 2$. We want to compute the ideal
of phylogenetic invariants for the general group-based model on $T_n$. After the
Fourier transform, the ideal of invariants (in Fourier coordinates) is given by
$I_n = \ker \varphi_n$, where $\varphi_n$ is the map between polynomial rings in
$k^n$ and $k(n+1)$ variables, respectively, defined in the Introduction. In
order to compute the toric ideal $I_n$, we first compute the lattice basis ideal
$I_{L_n} \subset I_n$ corresponding to $\varphi_n$ as follows. Fixing an order
on the variables of the two polynomial rings, the map $\varphi_n$ can be
represented by a matrix $B_{n,k}$ that describes the action of $\varphi_n$ on
the variables. Then the lattice $L_n = \ker(B_{n,k}) \subset \mathbb{Z}^{k^n}$
determines the ideal $I_{L_n}$. It is generated by elements of the form
$(\prod q_{g_1,\dots,g_n})^{v^+} - (\prod q_{g_1,\dots,g_n})^{v^-}$, where
$v = v^+ - v^-$ ranges over a basis of $L_n$. We will give an explicit
description of this basis and, equivalently, the ideals $I_{L_n}$.

Hereafter assume that $G = \mathbb{Z}_2$. For simplicity, let us say that
$B_n := B_{n,2}$. To create the matrix $B_n$, first order the two bases as
follows. Order the $a_g^{(i)}$ by varying the upper index $(i)$ first and then
the group element $g$:

\[ a_0^{(1)},\ a_0^{(2)},\ \dots,\ a_0^{(n+1)},\ a_1^{(1)},\ a_1^{(2)},\ \dots,\ a_1^{(n+1)}. \]

Then, order the $q_{g_1,\dots,g_n}$ by ordering the indices with respect to
binary counting:

\[ q_{0\dots00} > q_{0\dots01} > \cdots > q_{1\dots10} > q_{1\dots1}. \]

That is, $q_{g_1 \dots g_n} > q_{h_1 \dots h_n}$ if and only if
$(g_1 \dots g_n)_2 < (h_1 \dots h_n)_2$, where

\[ (g_1 \dots g_n)_2 := g_1 2^{n-1} + g_2 2^{n-2} + \cdots + g_n 2^0 \]

represents the binary number $g_1 \dots g_n$.

Next, index the rows of $B_n$ by $a_g^{(i)}$ and its columns by
$q_{g_1,\dots,g_n}$. Finally, put 1 in the entry of $B_n$ in the row indexed by
$a_g^{(i)}$ and column indexed by $q_{g_1,\dots,g_n}$ if $a_g^{(i)}$ divides the
image of $q_{g_1,\dots,g_n}$, and 0 otherwise.

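Example 1. Let $n = 2$. Then we order the $q_{g_1 g_2}$ variables according to
binary counting: $q_{00}, q_{01}, q_{10}, q_{11}$, so that

\[ \varphi_2 : \mathbb{C}[q_{00}, q_{01}, q_{10}, q_{11}] \to \mathbb{C}[a_0^{(1)}, a_0^{(2)}, a_0^{(3)}, a_1^{(1)}, a_1^{(2)}, a_1^{(3)}], \]
\[ q_{00} \mapsto a_0^{(1)} a_0^{(2)} a_{0+0}^{(3)}, \qquad q_{01} \mapsto a_0^{(1)} a_1^{(2)} a_{0+1}^{(3)}, \]
\[ q_{10} \mapsto a_1^{(1)} a_0^{(2)} a_{1+0}^{(3)}, \qquad q_{11} \mapsto a_1^{(1)} a_1^{(2)} a_{1+1}^{(3)}. \]

Now we put the $a_g^{(i)}$ variables in order:
$a_0^{(1)}, a_0^{(2)}, a_0^{(3)}, a_1^{(1)}, a_1^{(2)}, a_1^{(3)}$. Thus

\[ B_2 = \begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0
\end{bmatrix}. \]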
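
The definition of $B_n$ can be turned into a few lines of code. The following is a minimal sketch, assuming the row and column orders fixed above; the function name `build_B` is ours and does not appear in the paper.

```python
from itertools import product

def build_B(n):
    """Matrix B_n for G = Z_2.

    Rows: a_0^(1), ..., a_0^(n+1), a_1^(1), ..., a_1^(n+1).
    Columns: observations (g_1, ..., g_n) in binary counting order.
    """
    cols = list(product((0, 1), repeat=n))

    def label(obs, i):
        # i < n: observation at leaf i+1; i == n: the sum g_1 + ... + g_n in Z_2
        return sum(obs) % 2 if i == n else obs[i]

    return [[int(label(obs, i) == g) for obs in cols]
            for g in (0, 1) for i in range(n + 1)]

for row in build_B(2):
    print(row)
```

For $n = 2$ the printed rows can be compared entry by entry with the matrix $B_2$ of Example 1.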

The tree $T_{n-1}$ can be considered as a subtree of $T_n$ by ignoring, for
example, the leftmost leaf of $T_n$. As a consequence, a natural question
arises: how does $B_n$ relate to $B_{n-1}$?

Remark 1. The matrix $B_{n-1}$ for the subtree of $T_n$ with the leaf (1)
removed can be obtained as a submatrix of $B_n$ for the tree $T_n$ by deleting
rows 1 and $(n+1)+1$ and taking only the first $2^{n-1}$ columns.

Divide the $n$-leaf matrix $B_n$ into a $2 \times 2$ block matrix with blocks
of size $(n+1) \times 2^{n-1}$:

\[ B_n = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}. \]

Then, grouping together $B_{11}$ and $B_{21}$ without the first row of each, we
obtain the matrix $B_{n-1}$. This is true because rows 1 and $(n+1)+1$ represent
the variables $a_g^{(1)}$ for $g \in G$ associated with the leaf (1) of $T_n$.
Note that the entries in row $a_g^{(n+1)}$ remain undisturbed as the omitted
rows are indexed by the identity of the group.

Example 2. The matrix $B_2$ is equal to the submatrix of $B_3$ formed by rows
2, 3, 4, 6, 7, 8, and the first 4 columns.
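
Remark 1 can be checked mechanically for small $n$. The sketch below assumes the row and column orders of this section; `build_B` and `submatrix_without_leaf_1` are hypothetical helper names introduced only for this illustration.

```python
from itertools import product

def build_B(n):
    """B_n for G = Z_2, rows a_0^(1..n+1) then a_1^(1..n+1), columns in binary order."""
    cols = list(product((0, 1), repeat=n))

    def label(obs, i):
        return sum(obs) % 2 if i == n else obs[i]

    return [[int(label(obs, i) == g) for obs in cols]
            for g in (0, 1) for i in range(n + 1)]

def submatrix_without_leaf_1(n):
    """Delete rows 1 and (n+1)+1 of B_n and keep the first 2^(n-1) columns."""
    B = build_B(n)
    keep = [r for r in range(2 * (n + 1)) if r not in (0, n + 1)]  # 0-based row indices
    return [B[r][: 2 ** (n - 1)] for r in keep]

# Example 2 is the case n = 3; the same identity is tested for a few more n.
for n in range(3, 7):
    assert submatrix_without_leaf_1(n) == build_B(n - 1)
```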

Remark 2. Fix any observation $\sigma = g_1, \dots, g_n$ on the leaves.
Clearly, at any given leaf $j \in \{1, \dots, n\}$, we observe exactly one group
element, $g_j$. Since the matrix entry $b_{a_g^{(j)}, q_\sigma}$ in the row
indexed by $a_g^{(j)}$ and column indexed by $q_\sigma$ is 1 exactly when
$a_g^{(j)}$ divides the image of $q_\sigma$, one has that

\[ \sum_{g \in G} b_{a_g^{(j)}, q_\sigma} = 1 \]

for a fixed leaf $(j)$ and fixed observation $\sigma$. Note that the formula
also holds if $j = n + 1$ by definition of
$a_{g_{n+1}}^{(n+1)} = a_{g_1 + \cdots + g_n}^{(n+1)}$. In particular, the rows
indexed by $a_g^{(j)}$ for a fixed $j$ sum up to the row of ones.

3 Number of lattice basis elements 

We compute the dimension of the kernel of $B_n$ by induction on $n$. We proceed
in two steps.

Lemma 1 (Lower bound).

\[ \operatorname{rank}(B_n) \geq \operatorname{rank}(B_{n-1}) + 1. \]

Proof. First note that $\operatorname{rank}(B_n) \geq \operatorname{rank}(B_{n-1})$
since $B_{n-1}$ is a submatrix of the first $2^{n-1}$ columns of $B_n$. In the
block $[B_{11}, B_{21}]^T$ formed by the first $2^{n-1}$ columns, the row
indexed by $a_1^{(1)}$ is zero, while in the block $[B_{12}, B_{22}]^T$ formed
by the last $2^{n-1}$ columns, the row indexed by $a_1^{(1)}$ is 1. Choosing one
column from $[B_{12}, B_{22}]^T$ provides a vector independent of the first
$2^{n-1}$ columns. The rank must therefore increase by at least 1. □

Lemma 2 (Upper bound).

\[ \operatorname{rank}(B_n) \leq n + 2. \]

Proof. $B_n$ has $2(n+1)$ rows. Remark 2 provides $n$ independent relations
among the rows of our matrix: varying $j$ from 1 to $n+1$, we obtain that the
sum of the rows $j$ and $n+1+j$ is the row of ones for each
$j = 1, \dots, n+1$. Thus the upper bound is immediate. □



We are ready for the main result of the section.

Proposition 1 (Cardinality of lattice basis).

Let $n \geq 2$. Then there are $2^n - 2(n+1) + n$ elements in the basis of the
lattice $L_n$ corresponding to $T_n$. That is,

\[ \dim \ker(B_n) = 2^n - 2(n+1) + n. \]

Proof. We show $\operatorname{rank}(B_n) = 2(n+1) - n$. It can be checked
directly that $B_2$ has full rank. Assume that the claim is true for $n - 1$.
Then by Lemmas 1 and 2,

\[ 2(n+1) - n \geq \operatorname{rank}(B_n) \geq \operatorname{rank}(B_{n-1}) + 1 = 2n - (n-1) + 1, \]

where the last equality is provided by the induction hypothesis. The claim
follows since the left- and the right-hand sides agree. □
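
Proposition 1 can be sanity-checked numerically for small $n$. The following sketch computes the rank of $B_n$ over $\mathbb{Q}$ with exact fractions; `build_B` and `rank` are helper names chosen for this illustration, and the check covers only the listed values of $n$, so it is no substitute for the induction.

```python
from fractions import Fraction
from itertools import product

def build_B(n):
    """B_n for G = Z_2, rows a_0^(1..n+1) then a_1^(1..n+1), columns in binary order."""
    cols = list(product((0, 1), repeat=n))

    def label(obs, i):
        return sum(obs) % 2 if i == n else obs[i]

    return [[int(label(obs, i) == g) for obs in cols]
            for g in (0, 1) for i in range(n + 1)]

def rank(matrix):
    """Rank over Q by Gauss-Jordan elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

for n in range(2, 7):
    rk = rank(build_B(n))
    # columns: n, rank(B_n), dim ker(B_n)
    print(n, rk, 2 ** n - rk)
```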

4 Lattice basis 

In this section we describe a basis of the kernel of $B_n := B_{n,2}$, in which
the binomials corresponding to the basis elements satisfy the conjecture on the
degrees of the generators of the phylogenetic ideal. In particular, since the
ideal is generated by squarefree binomials and contains no linear forms, these
elements are actually circuits. By Proposition 1 we need to find $2^n - (n+2)$
linearly independent vectors in the lattice. The matrix of the tree with $n = 2$
leaves has a trivial kernel, so we begin with the tree on $n = 3$ leaves. The
dimension of the kernel is 3 and the lattice basis is given by the rows of the
following matrix:

\[ \begin{bmatrix}
0 & 0 & 1 & -1 & -1 & 1 & 0 & 0 \\
0 & 1 & 0 & -1 & -1 & 0 & 1 & 0 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1
\end{bmatrix}. \]

In order to study the kernels of $B_n$ for any $n$, it is useful to have an
algorithmic way of constructing the matrices.

Algorithm 1 [The construction of $B_n$]
Input: the number of leaves $n$ of the claw tree $T_n$.
Output: $B_n \in \mathbb{Z}^{2(n+1) \times 2^n}$.
Initialize $B_n$ to the zero matrix.
Construct the first $n$ rows:
    for $k$ from 1 to $n$ do:
        for $c$ from 0 to $2^k - 1$ with $c \equiv 0 \bmod 2$ do:
            for $j$ from $c\,2^{n-k} + 1$ to $(c+1)\,2^{n-k}$ do: $b_{k,j} := 1$.
Construct row $n + 1$:
    for $j$ from 1 to $2^n$ do:
        if $n \equiv (\sum_{r=1}^{n} b_{r,j}) \bmod 2$, then $b_{n+1,j} := 1$.
Construct rows $n + 2$ to $2(n + 1)$:
    for $i$ from 1 to $n + 1$ do:
        for $j$ from 1 to $2^n$ do: $b_{n+1+i,j} := 1 - b_{i,j}$.

One checks that this algorithm indeed gives the matrices $B_n$ as defined in
Section 2.
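
For readers who want to experiment, here is a direct transcription of Algorithm 1 into code. It is only a sketch: the function name `algorithm_1` is ours, the 1-based indices of the pseudocode are shifted to 0-based lists, and the loop over $j$ in the construction of row $n + 1$ is our reading of the pseudocode.

```python
def algorithm_1(n):
    """Build B_n following the three steps of Algorithm 1."""
    n_rows, n_cols = 2 * (n + 1), 2 ** n
    b = [[0] * n_cols for _ in range(n_rows)]

    # First n rows: row k has a 1 in column j exactly when the k-th binary
    # digit of j - 1 is 0, i.e. when 0 is observed at leaf k.
    for k in range(1, n + 1):
        for c in range(0, 2 ** k, 2):  # even c only
            for j in range(c * 2 ** (n - k) + 1, (c + 1) * 2 ** (n - k) + 1):
                b[k - 1][j - 1] = 1

    # Row n + 1: the column sum counts the zeros of the observation, so the
    # parity test below detects an even number of ones (sum equal to 0 in Z_2).
    for j in range(n_cols):
        if (n - sum(b[r][j] for r in range(n))) % 2 == 0:
            b[n][j] = 1

    # Rows n + 2 to 2(n + 1): binary complements of the first n + 1 rows.
    for i in range(n + 1):
        for j in range(n_cols):
            b[n + 1 + i][j] = 1 - b[i][j]
    return b

for row in algorithm_1(2):
    print(row)
```

The output for $n = 2$ can be compared with the matrix $B_2$ of Example 1.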

The $(n+1+i)$-th row $r_{n+1+i}$ of $B_n$ is by definition the binary
complement of the $i$-th row $r_i$ of $B_n$. Suppose that $r_i \cdot k = 0$ for
some vector $k$. Since all entries of $B_n$ are nonnegative, a subvector of $k$
restricted to the entries where $r_i$ is nonzero must be homogeneous in the
sense that the sum of the positive entries equals the sum of the absolute values
of the negative entries. But since the ideal $I_{L_n}$ itself is homogeneous
([10]), the same must be true for the subvector of $k$ restricted to the entries
where $r_i$ is zero. Hence $r_{n+1+i} \cdot k = 0$. Therefore, it is enough to
analyze the top half of the matrix $B_n$ when determining the kernel elements.

Remark 3. There are $n$ copies of $B_{n-1}$ inside $B_n$.

By deleting one leaf at a time, we get $n$ copies of $T_{n-1}$ as a subtree of
$T_n$. Suppose we delete leaf $(i)$ from $T_n$ to get the tree $T_n^{(i)}$ on
leaves $1, 2, \dots, i-1, i+1, \dots, n$. Ignoring the two rows of $B_n$ that
represent the leaf $(i)$ and taking into account the columns of $B_n$ containing
nonzero entries of the row indexed by $a_0^{(i)}$ (that is, observing 0 at leaf
$(i)$) gives precisely the matrix $B_{n-1}$ corresponding to $T_n^{(i)}$. Note
that the entry indexed by $a_g^{(n+1)}$, for any $g \in G$, will be correct
since we are ignoring the identity of the group, as in Remark 1.

This leads to a way of constructing a basis of $\ker(B_n)$ from the one of
$\ker(B_{n-1})$. Namely, removing leaf (1) from $T_n$ produces
$\dim(\ker(B_{n-1})) = 2^{n-1} - n - 1$ independent vectors in $\ker(B_n)$. Let
us name this collection of vectors $V_1$. Removing leaf (2) produces a
collection $V_2$ consisting of
$\dim(\ker B_{n-1}) - \dim(\ker B_{n-2}) = 2^{n-2} - 1$ vectors in $\ker(B_n)$.
$V_2$ is independent of $V_1$ since the second half of each vector in $V_2$ has
nonzero entries in the columns of $B_n$ where all vectors in $V_1$ are zero, a
direct consequence of the location of the submatrix corresponding to
$T_n^{(2)}$. Finally, removing any other leaf $(i)$ of $T_n$ produces a
collection $V_i$ of as many new kernel elements as there are new columns
involved (in terms of the submatrix structure); namely, $2^{n-i}$ new vectors.
Note that every vector in $V_i$ has a nonzero entry in at least one new column
so that the full collection is independent of $V_1$.

Using the above procedure, we have obtained

\[ (2^{n-1} - n - 1) + (2^{n-2} - 1) + 2^{n-3} + \cdots + 2^{n-n} \]

independent vectors in the kernel of $B_n$. This is exactly one less than the
desired number, $2^n - n - 2$. Hence to the list of the kernel generators we add
one additional vector $v$ that is independent of all the $V_i$,
$i = 1, \dots, n$, as it has a nonzero entry in the last column. (Note that no
vector in any $V_i$ has this property by the observation on the column location
of the submatrix associated with each $T_n^{(i)}$.) In particular,
$v = [0, \dots, 0, 1, 0, 0, -1, -1, 0, 0, 1] \in \ker(B_n)$. To see this, we
simply notice that the rows of the last 8-column block of $B_n$ are precisely
the rows of the first 8-column block of $B_n$ up to permutation of rows, which
does not affect the kernel.
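
Both the count and the membership $v \in \ker(B_n)$ are easy to confirm for small $n$. The sketch below assumes the row and column orders of Section 2; `build_B` and `count_from_leaf_removals` are names introduced only for this check.

```python
from itertools import product

def build_B(n):
    """B_n for G = Z_2, rows a_0^(1..n+1) then a_1^(1..n+1), columns in binary order."""
    cols = list(product((0, 1), repeat=n))

    def label(obs, i):
        return sum(obs) % 2 if i == n else obs[i]

    return [[int(label(obs, i) == g) for obs in cols]
            for g in (0, 1) for i in range(n + 1)]

def count_from_leaf_removals(n):
    """|V_1| + |V_2| + |V_3| + ... + |V_n| as counted in the text."""
    return ((2 ** (n - 1) - n - 1) + (2 ** (n - 2) - 1)
            + sum(2 ** (n - i) for i in range(3, n + 1)))

for n in range(3, 8):
    # one vector short of dim ker(B_n) = 2^n - n - 2
    assert count_from_leaf_removals(n) == (2 ** n - n - 2) - 1
    # the additional vector v is supported on the last 8 columns
    v = [0] * (2 ** n - 8) + [1, 0, 0, -1, -1, 0, 0, 1]
    assert all(sum(b * x for b, x in zip(row, v)) == 0 for row in build_B(n))
```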

The lattice basis we just constructed is directly computed by the following
algorithm.

Algorithm 2 [Construction of the lattice basis for $T_n$]

Input: the number of leaves $n$ of the claw tree $T_n$.

Output: a basis of $\ker B_n$ in the form of a $(2^n - n - 2) \times 2^n$
matrix $L_n$.

Let
\[ L_3 := \begin{bmatrix}
0 & 0 & 1 & -1 & -1 & 1 & 0 & 0 \\
0 & 1 & 0 & -1 & -1 & 0 & 1 & 0 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1
\end{bmatrix}. \]

Set $k := 4$.

The following subroutine lifts $L_{k-1}$ to $L_k$:

WHILE $k \leq n$ do: {
    Initialize $L_k$ to the zero matrix.
    For $i$ from 1 to $k$ do:
        $\mathrm{cols}(i) := \{1..2^{k-i},\ (2)2^{k-i}+1..(3)2^{k-i},\ \dots,\ (2^i - 2)2^{k-i}+1..(2^i - 1)2^{k-i}\}$.
    Denote by $L_{k,j}[\mathrm{cols}(i)]$ the $j$-th row vector of $L_k$ restricted to columns $\mathrm{cols}(i)$.
    Set $i := 1$:
        for $j$ from 1 to $2^{k-1} - k - 1$ do:
            $L_{k,j}[\mathrm{cols}(i)] := L_{k-1,j}$.
    Set $i := 2$:
        for $j$ from 1 to $2^{k-2} - 1$ do:
            $L_{k,(2^{k-1}-k-1)+j}[\mathrm{cols}(i)] := L_{k-1,(2^{k-1}-k-1)-(2^{k-2}-1)+j}$.
    For $i$ from 3 to $k$ do:
        for $j$ from 1 to $2^{k-i}$ do:
            $L_{k,(2^k - 2^{k-i+1} - k - 2)+j}[\mathrm{cols}(i)] := L_{k-1,(2^{k-1}-k-1)-(2^{k-i})+j}$.
    Finally, $L_{k,\,2^k-k-2}[2^k - 7..2^k] := [1, 0, 0, -1, -1, 0, 0, 1]$.
    Set $k := k + 1$. }
RETURN $L_n$.
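
Algorithm 2 can be prototyped in a few lines. The sketch below is our reading of the lifting subroutine: for each leaf $i$, the appropriate trailing rows of $L_{k-1}$ are copied into the columns whose $i$-th binary digit is 0, and the vector $v$ is appended last. The names `cols`, `lift`, `lattice_basis` and `in_kernel` are hypothetical, and the loop counter is assumed to increase by one after each pass. The code checks kernel membership and the number of rows only; linear independence is argued in the text.

```python
from itertools import product

L3 = [[0, 0, 1, -1, -1, 1, 0, 0],
      [0, 1, 0, -1, -1, 0, 1, 0],
      [1, 0, 0, -1, -1, 0, 0, 1]]

def cols(i, k):
    """0-based indices of the columns of B_k whose i-th binary digit (1-based) is 0."""
    return [c for c in range(2 ** k) if (c >> (k - i)) & 1 == 0]

def lift(prev, k):
    """One pass of the subroutine: build L_k from prev = L_{k-1}."""
    rows = []
    for i in range(1, k + 1):
        if i == 1:
            sources = prev                                    # all rows of L_{k-1}
        elif i == 2:
            sources = prev[len(prev) - (2 ** (k - 2) - 1):]   # last 2^(k-2) - 1 rows
        else:
            sources = prev[len(prev) - 2 ** (k - i):]         # last 2^(k-i) rows
        for src in sources:
            row = [0] * 2 ** k
            for value, c in zip(src, cols(i, k)):
                row[c] = value
            rows.append(row)
    rows.append([0] * (2 ** k - 8) + [1, 0, 0, -1, -1, 0, 0, 1])
    return rows

def lattice_basis(n):
    basis = L3
    for k in range(4, n + 1):
        basis = lift(basis, k)
    return basis

def in_kernel(v, n):
    """Check B_n v = 0 directly from the definition of the rows of B_n."""
    obs = list(product((0, 1), repeat=n))
    for i in range(n + 1):
        for g in (0, 1):
            total = sum(x for x, o in zip(v, obs)
                        if (sum(o) % 2 if i == n else o[i]) == g)
            if total != 0:
                return False
    return True

for n in range(3, 7):
    basis = lattice_basis(n)
    assert len(basis) == 2 ** n - n - 2
    assert all(in_kernel(v, n) for v in basis)
```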



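Example 3. Consider the tree on $n = 4$ leaves. The lattice basis is given by
the rows of the following matrix:

\[ L_4 = \left[\begin{array}{*{16}{r}}
0 & 0 & 1 & -1 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & -1 & 0 & 0 & 0 & 0 & -1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & -1 & 0 & 0 & 1
\end{array}\right]. \]
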



The lattice vectors correspond to the relations on the leaf observations in the
natural way; namely, the first column corresponds to $q_{0,\dots,0}$, the second
to $q_{0,\dots,0,1}$, and so on. Therefore, the lattice basis ideal for $T_4$ in
Fourier coordinates is

\[ \begin{aligned} I_{L_4} = (\,
& q_{0010}q_{0101} - q_{0011}q_{0100},\ q_{0001}q_{0110} - q_{0011}q_{0100},\ q_{0000}q_{0111} - q_{0011}q_{0100}, \\
& q_{0010}q_{1001} - q_{0011}q_{1000},\ q_{0001}q_{1010} - q_{0011}q_{1000},\ q_{0000}q_{1011} - q_{0011}q_{1000}, \\
& q_{0001}q_{1100} - q_{0101}q_{1000},\ q_{0000}q_{1101} - q_{0101}q_{1000}, \\
& q_{0000}q_{1110} - q_{0110}q_{1000},\ q_{1000}q_{1111} - q_{1011}q_{1100}\,). \end{aligned} \]

This ideal is contained in the ideal of phylogenetic invariants $I_4$ for $T_4$.
In the next section, we compute explicitly the generators of the ideal of
invariants for any claw tree $T_n$ and the group $\mathbb{Z}_2$.

5 Ideal of invariants 

We show that the lattice basis ideals provide basic building blocks for the
full ideals of invariants, as expected. However, instead of computing the ideal
of invariants as a saturation of the lattice basis ideal in a standard way (e.g.
[8], [10]), we use the recursive constructions from the previous section on the
saturated ideals directly. We begin with the ideal of invariants for the
smallest tree, and build all other trees recursively. The underlying ideas for
how to lift the generating sets come from Algorithm 2.

We will denote the ideal of the claw tree on $n$ leaves by
$I_n = \ker \varphi_n$. As we have seen, the first nontrivial ideal is $I_3$.

5.1 The tree on n = 3 leaves

Claim. The ideal of the claw tree on $n = 3$ leaves is

\[ I_3 = (q_{000}q_{111} - q_{100}q_{011},\ q_{001}q_{110} - q_{100}q_{011},\ q_{010}q_{101} - q_{100}q_{011}). \]

This can be verified by computation. In particular, this ideal is equal to the
lattice basis ideal for the tree on three leaves; $I_{L_3}$ is already prime in
this case. Let $< \,:=\, <_{lex}$ be the lexicographic order on the variables
induced by

\[ q_{000} > q_{001} > q_{010} > q_{011} > q_{100} > q_{101} > q_{110} > q_{111}. \]

(That is, $q_{ijk} > q_{i'j'k'}$ if and only if $(ijk)_2 < (i'j'k')_2$, where
$(ijk)_2$ denotes the binary number $ijk$.)

Remark 4. The three generators of $I_3$ above are a Gröbner basis for $I_3$
with respect to $<$, since the initial terms, written with coefficient $+1$ in
the above description, are relatively prime so all the S-pairs reduce to zero.
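
The claim and Remark 4 rest on two elementary facts that a short script can confirm: both monomials of each generator have the same image under $\varphi_3$, and the three initial terms share no variable. In the sketch below a monomial is a tuple of binary strings, and `image` is a helper name of ours that records the image as a multiset of (leaf, label) pairs.

```python
from collections import Counter

def image(monomial):
    """Image under phi_n of a product of q-variables, as a multiset of factors a_g^(i)."""
    factors = Counter()
    for obs in monomial:                       # obs is a binary string such as "011"
        bits = [int(ch) for ch in obs]
        for leaf, g in enumerate(bits, start=1):
            factors[(leaf, g)] += 1
        factors[(len(bits) + 1, sum(bits) % 2)] += 1   # the factor a_{g_1+...+g_n}^(n+1)
    return factors

# (q^+, q^-) for the three generators of I_3
generators = [(("000", "111"), ("100", "011")),
              (("001", "110"), ("100", "011")),
              (("010", "101"), ("100", "011"))]

for plus, minus in generators:
    assert image(plus) == image(minus)         # the binomial lies in ker(phi_3)
    assert min(plus + minus) in plus           # q^+ contains the largest variable, so it is the initial term

leading = [set(plus) for plus, _ in generators]
assert all(leading[i].isdisjoint(leading[j])
           for i in range(3) for j in range(i + 1, 3))
```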

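Remark 5. Write the quadratic binomial $q = q^+ - q^-$ as

\[ q = q_{g_1^{(1)} g_1^{(2)} g_1^{(3)}}\, q_{g_2^{(1)} g_2^{(2)} g_2^{(3)}} - q_{h_1^{(1)} h_1^{(2)} h_1^{(3)}}\, q_{h_2^{(1)} h_2^{(2)} h_2^{(3)}}. \]

Then $q \in I_3$ if and only if the following two conditions hold:
1. Exchanging the roles of $q_{h_1^{(1)} h_1^{(2)} h_1^{(3)}}$ and
$q_{h_2^{(1)} h_2^{(2)} h_2^{(3)}}$ if necessary,

and

2. $g_1^{(i)} + g_2^{(i)} = 1 = h_1^{(i)} + h_2^{(i)}$ for $1 \leq i \leq 3 = n$.

Note that the second condition holds since otherwise the projection of $q$
obtained by eliminating the leaf $(i)$ at which the observations $g_1^{(i)}$ and
$g_2^{(i)}$ are both equal to 0 or to 1 produces an element $q'$ in the kernel
of the map $\varphi_2$ of the 2-leaf tree, which is trivial.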

5.2 The tree on an arbitrary number of leaves 

Let us now define a set of maps and a distinguished set of binomials in $I_n$.

Definition 1. Let $\pi_i(q)$ be the projection of $q$ that eliminates the
$i$-th index of each variable in $q$.

For example,

\[ \pi_4(q_{0000}q_{1110} - q_{1000}q_{0110}) = q_{000}q_{111} - q_{100}q_{011}. \]

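The projection is a purely combinatorial operation on index strings. A minimal sketch, assuming a binomial is stored as a pair of monomials and each monomial as a tuple of binary strings (this representation and the name `project` are ours):

```python
def project(binomial, i):
    """pi_i: delete the i-th index (1-based) from every variable of a binomial."""
    return tuple(tuple(obs[:i - 1] + obs[i:] for obs in monomial)
                 for monomial in binomial)

q = (("0000", "1110"), ("1000", "0110"))   # q_0000 q_1110 - q_1000 q_0110
print(project(q, 4))                       # (('000', '111'), ('100', '011'))
```
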
Definition 2. Assume that $n \geq 4$.

Let $Q_n$ be the set of quadratic binomials $q \in I_n$ that can be written as

\[ q = q^+ - q^- = q_{g_1^{(1)} \cdots g_1^{(n)}}\, q_{g_2^{(1)} \cdots g_2^{(n)}} - q_{h_1^{(1)} \cdots h_1^{(n)}}\, q_{h_2^{(1)} \cdots h_2^{(n)}}, \]

such that one of the two following properties is satisfied:

Property (i): For some $1 \leq i \leq n$, $j \in \mathbb{Z}_2$,

\[ g_1^{(i)} = g_2^{(i)} = h_1^{(i)} = h_2^{(i)} = j \tag{1} \]

and

\[ \pi_i(q) \in I_{n-1}. \tag{2} \]

Property (ii): For each $1 \leq k \leq n$,

\[ g_1^{(k)} + g_2^{(k)} = 1 = h_1^{(k)} + h_2^{(k)} \tag{3} \]

and

\[ \pi_k(q) \in I_{n-1}. \tag{4} \]




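Example 4. Let $n = 4$. The set of elements $q \in Q_n$ with Property (i)
consists of those for which $j = 0$:

\[ \begin{aligned}
& q_{0000}q_{0111} - q_{0100}q_{0011},\quad q_{0001}q_{0110} - q_{0100}q_{0011},\quad q_{0010}q_{0101} - q_{0100}q_{0011}, \\
& q_{0000}q_{1011} - q_{1000}q_{0011},\quad q_{0001}q_{1010} - q_{1000}q_{0011},\quad q_{0010}q_{1001} - q_{1000}q_{0011}, \\
& q_{0000}q_{1101} - q_{1000}q_{0101},\quad q_{0001}q_{1100} - q_{1000}q_{0101},\quad q_{0100}q_{1001} - q_{1000}q_{0101}, \\
& q_{0000}q_{1110} - q_{1000}q_{0110},\quad q_{0010}q_{1100} - q_{1000}q_{0110},\quad q_{0100}q_{1010} - q_{1000}q_{0110};
\end{aligned} \]

and those for which $j = 1$:

\[ \begin{aligned}
& q_{1000}q_{1111} - q_{1100}q_{1011},\quad q_{1001}q_{1110} - q_{1100}q_{1011},\quad q_{1010}q_{1101} - q_{1100}q_{1011}, \\
& q_{0100}q_{1111} - q_{1100}q_{0111},\quad q_{0101}q_{1110} - q_{1100}q_{0111},\quad q_{0110}q_{1101} - q_{1100}q_{0111}, \\
& q_{0010}q_{1111} - q_{1010}q_{0111},\quad q_{0011}q_{1110} - q_{1010}q_{0111},\quad q_{0110}q_{1011} - q_{1010}q_{0111}, \\
& q_{0001}q_{1111} - q_{1001}q_{0111},\quad q_{0011}q_{1101} - q_{1001}q_{0111},\quad q_{0101}q_{1011} - q_{1001}q_{0111}.
\end{aligned} \]

The set of elements $q \in Q_n$ with Property (ii) are:

\[ \begin{aligned}
& q_{0000}q_{1111} - q_{1001}q_{0110},\quad q_{0001}q_{1110} - q_{1000}q_{0111},\quad q_{0011}q_{1100} - q_{1001}q_{0110}, \\
& q_{0010}q_{1101} - q_{1000}q_{0111},\quad q_{0101}q_{1010} - q_{1001}q_{0110},\quad q_{0100}q_{1011} - q_{1000}q_{0111}.
\end{aligned} \]
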
Proposition 2. For $n \geq 4$, the set of binomials in $Q_n$ generates the
ideal $I_n$. That is,

\[ I_n = (q : q = q^+ - q^- \in Q_n). \]

In addition, this set of generators can be obtained inductively by lifting the
generators corresponding to the various phylogenetic ideals on $n - 1$ leaves.

Proof. Condition (3) is simply the negation of (1). Condition (1) can be
restated as follows: for some $1 \leq i \leq n$ and a fixed $j$,

\[ (a_j^{(i)})^2 \mid \varphi_n(q^+) \quad \text{and} \quad (a_j^{(i)})^2 \mid \varphi_n(q^-). \]

Therefore, Property (i) translates to having an observation $j$ fixed at leaf
$(i)$ for each of the variables in $q$. On the other hand, condition (3) means
that for any $k$, not all the $k$-th indices are 0 and not all are 1. Thus
Property (ii) means that no leaf has a fixed observation, and can be restated as
follows: for every $1 \leq i \leq n$,

\[ a_0^{(i)} a_1^{(i)} \mid \varphi_n(q^+) \quad \text{and} \quad a_0^{(i)} a_1^{(i)} \mid \varphi_n(q^-). \tag{5} \]

By definition, the ideal $I_n$ is toric, so it is generated by binomials. In
fact, it is generated by homogeneous binomials, because each column of the
matrix $B_n$ used for defining it has sum $n + 1$ ([10], chapter 4). In
addition, Sturmfels and Sullivant in [11] have shown that the ideal $I_n$ is
generated in degree 2. Hence it suffices to consider homogeneous quadratic
binomials. Let $q = q^+ - q^-$ be a binomial in $I_n$ of degree 2. Then clearly
either (1) or (3) holds; that is, either the index corresponding to one leaf is
fixed for all the monomials in $q$, or none of them are.

In the former case, for the index $i$ from equation (1),

\[ q \in I_n \iff \varphi_n(q^+) = \varphi_n(q^-) \iff \varphi_{n-1}(\pi_i(q^+)) = \varphi_{n-1}(\pi_i(q^-)) \iff \pi_i(q) \in I_{n-1}, \]

where the first statement holds by definition of $\varphi_n$ and the second by
definition of the projection $\pi_i$.

In the latter case, for each $i$ with $1 \leq i \leq n$,

\[ q \in I_n \iff \varphi_n(q^+) = \varphi_n(q^-) \]

where the second statement holds by definition of $\pi_i$ and (5). It follows
that $I_n = (q : q \in Q_n)$.

In particular, the set of generators for $I_n$ with Property (i) can be
obtained from those of $I_{n-1}$ by inserting 0 first at the $i$-th index
position for each monomial of $q \in Q_{n-1}$ and then repeating the same
process by inserting 1. This operation corresponds to lifting to all the
possible preimages of $\pi_i(q)$ that satisfy Property (i) for each
$1 \leq i \leq n$ and every $q \in Q_{n-1}$. The set of generators for $I_n$
with Property (ii) can be obtained from those of $I_{n-1}$ by a similar lifting
to all preimages of $\pi_i(q)$ for each $q \in Q_{n-1}$ in such a way that
Property (ii) is satisfied. Namely, for every $q = q^+ - q^- \in Q_{n-1}$ with
Property (ii), one inserts 0 at the $i$-th index position for one monomial of
$q^+$ and for one monomial of $q^-$, and inserts 1 at the $i$-th index position
for the remaining monomials of $q^+$ and $q^-$. In addition, by definition of
Property (ii), it suffices to lift to the preimages of $\pi_n(q)$ only. □
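
The lifting with Property (i) is easy to reproduce for $n = 4$, starting from the three generators of $I_3$. The sketch below represents a binomial as a pair of monomials and a monomial as a tuple of binary strings; `image` and `lift_property_i` are names of ours. It checks that the 24 lifted binomials are distinct and lie in $I_4$.

```python
from collections import Counter

def image(monomial):
    """Image under phi_n of a product of q-variables, as a multiset of factors a_g^(i)."""
    factors = Counter()
    for obs in monomial:
        bits = [int(ch) for ch in obs]
        for leaf, g in enumerate(bits, start=1):
            factors[(leaf, g)] += 1
        factors[(len(bits) + 1, sum(bits) % 2)] += 1
    return factors

def lift_property_i(binomial, i, j):
    """Insert the observation j at index position i (1-based) in every variable."""
    return tuple(tuple(obs[:i - 1] + str(j) + obs[i - 1:] for obs in monomial)
                 for monomial in binomial)

I3_generators = [(("000", "111"), ("100", "011")),
                 (("001", "110"), ("100", "011")),
                 (("010", "101"), ("100", "011"))]

lifted = {lift_property_i(q, i, j)
          for q in I3_generators for i in range(1, 5) for j in (0, 1)}

assert len(lifted) == 24                              # 3 generators, 4 positions, 2 values of j
assert all(image(plus) == image(minus) for plus, minus in lifted)
```

Up to the order of the factors, these are the binomials listed under Property (i) in Example 4.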

Remark 6. A different recursion has been proposed by Sturmfels and Sullivant
in [11].

Recall ([10]) that a binomial $q = q^+ - q^- \in I$ is said to be primitive if
there exists no binomial $f = f^+ - f^- \in I$ with the property that
$f^+ \mid q^+$ and $f^- \mid q^-$. A circuit is a primitive binomial of minimal
support.

Remark 7. The binomials in $Q_n$ are circuits of $I_n$, since the ideal is
generated by squarefree binomials and contains no linear forms.

In general, we can describe the generators of $I_n$ as follows: given $n$,
begin by lifting recursively to produce $Q_{n-1}$; that is, until the number of
indices of each generator reaches $n - 1$. Next, lift $Q_{n-1}$ $n$ times so
that Property (i) is satisfied for one of the $n$ index positions. For example,

\[ q := q_{0000}q_{1111} - q_{1001}q_{0110} \in Q_4 \]

can be lifted to a generator of $I_5$ in ten different ways: by lifting to
preimages of $\pi_1, \dots, \pi_5$ so that Property (i) is satisfied with either
a 0 or a 1:

\[ \pi_1^{-1}(q) = \{ q_{00000}q_{01111} - q_{01001}q_{00110},\ q_{10000}q_{11111} - q_{11001}q_{10110} \}, \]
\[ \pi_2^{-1}(q) = \{ q_{00000}q_{10111} - q_{10001}q_{00110},\ q_{01000}q_{11111} - q_{11001}q_{01110} \}, \]

and so on. This will be the set of binomials in $Q_n$ with Property (i).
Clearly, some generators will repeat during the recursive lifting: lifting by
inserting 0 at position $(i)$ allows the 0 to occur at the previous $i - 1$
positions. Also, fixing 1 at any leaf allows 0 to appear on any of the other
leaves.

To construct $q^+ - q^-$ with Property (ii), we need not proceed inductively,
as all projections of binomials that satisfy this property must satisfy it, too.
Instead, we consider two cases corresponding to the parity of $n$. Namely,
recalling the definition of Property (ii), first we fix $q^-$ in such a way to
ensure that

\[ \operatorname{in}_{<_{lex}}(q) = q^+. \]

Suppose $n$ is odd. Fix $q^-$ by taking

\[ q^- = q_{01\dots1}\, q_{10\dots0} \]

with $n$ indices in each of the two variables. Then $n - 1$ being even provides
that $a_0^{(n+1)} a_1^{(n+1)} \mid \varphi_n(q^-)$. Thus every choice of $q^+$
must satisfy the same. To find $q^+$, we need to choose pairs of $n$-digit
binary numbers with digits complementary to each other, and thus there are
$2^{n-1} - 1$ choices for $q^+$. Specifically, listing the smallest
$2^{n-1} - 1$ $n$-digit binary numbers and pairing them with the largest
$2^{n-1} - 1$ $n$-digit binary numbers in reverse order produces all choices for
$q^+$, and we have a complete list of generators. For example, the first such
generator in the list would be $q_{0\dots0}\, q_{1\dots1} - q_{01\dots1}\, q_{10\dots0}$.

If $n$ is even, then we can create $q^-$ such that $(a_0^{(n+1)})^2$ or
$(a_1^{(n+1)})^2$ divides $\varphi_n(q^-)$ and $\varphi_n(q^+)$. Namely, the two
choices for $q^-$ are

\[ q^- = q_{01\dots1}\, q_{10\dots0} \quad \text{and} \quad q^- = q_{01\dots10}\, q_{10\dots01}. \]

The list of all possible $q^+$ is obtained in the manner similar to the case
when $n$ is odd, except that the pairs whose index sums are odd receive the
first choice of $q^-$, while the pairs whose index sums are even receive the
second. The number of such generators $q^+ - q^-$ is $2^{n-1} - 2$, since there
are $2^n$ $n$-digit binary numbers and thus half as many pairs, and 2 choices
are taken by the $q^-$.

In summary, the number of generators of $I_n$ that satisfy Property (ii) is
$(2^{n-1} - 2) + (n \bmod 2)$.
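
The construction above can be replayed mechanically. The following sketch pairs each binary string starting with 0 with its complement, skips the pairs reserved for $q^-$, and selects the $q^-$ with the same image under $\varphi_n$; all function names are ours. It then confirms the count $(2^{n-1} - 2) + (n \bmod 2)$ for a few values of $n$.

```python
from collections import Counter

def image(monomial):
    """Image under phi_n of a product of q-variables, as a multiset of factors a_g^(i)."""
    factors = Counter()
    for obs in monomial:
        bits = [int(ch) for ch in obs]
        for leaf, g in enumerate(bits, start=1):
            factors[(leaf, g)] += 1
        factors[(len(bits) + 1, sum(bits) % 2)] += 1
    return factors

def complement(obs):
    return "".join("1" if ch == "0" else "0" for ch in obs)

def property_ii_generators(n):
    first = ("0" + "1" * (n - 1), "1" + "0" * (n - 1))                 # q_{01...1} q_{10...0}
    second = ("0" + "1" * (n - 2) + "0", "1" + "0" * (n - 2) + "1")    # q_{01...10} q_{10...01}
    minus_choices = [first] if n % 2 else [first, second]
    reserved = {m[0] for m in minus_choices}
    generators = []
    for value in range(2 ** (n - 1)):          # the member of each pair that starts with 0
        a = format(value, "0{}b".format(n))
        if a in reserved:
            continue
        plus = (a, complement(a))
        minus = next(m for m in minus_choices if image(m) == image(plus))
        generators.append((plus, minus))
    return generators

for n in range(3, 9):
    gens = property_ii_generators(n)
    assert len(gens) == (2 ** (n - 1) - 2) + (n % 2)
    assert all(image(plus) == image(minus) for plus, minus in gens)
```

For $n = 4$ the output agrees, up to the order of the factors, with the six binomials listed under Property (ii) in Example 4.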

Next we strengthen Proposition 2.

Proposition 3. The set $Q_n$ is a lexicographic Gröbner basis of $I_n$, for any
$n \geq 4$.

Proof. For the case $n = 3$ this is already shown. Let $n > 3$. Then we can
partition the set of $q \in Q_n$ into those satisfying Property (i) or (ii).
Note that $I_n$ is prime by definition, and thus radical. Also, Proposition 2
shows it is generated by squarefree quadratic binomials. These facts are used in
what follows.

Let $q_i, q_j \in I_n$. If $(q_i^+, q_j^+) = 1$, the S-pair $S(q_i, q_j)$
reduces to zero. Also, if $q_i^-$ and $q_j^-$ are not relatively prime, the
cancellation criterion provides that the corresponding S-pair also reduces to
zero. Therefore we consider $f := S(q_i, q_j) \in I_n$ with
$(q_i^+, q_j^+) \neq 1$ and $(q_i^-, q_j^-) = 1$. In particular, $\deg(f) = 3$.
Let us write $q_i = q_{g_1} q_{g_2} - q_{h_1} q_{h_2}$ and
$q_j = q_{g_1} q_{g_3} - q_{h_3} q_{h_4}$. Then

\[ f = q_{g_3} q_{h_1} q_{h_2} - q_{g_2} q_{h_3} q_{h_4} \in I_n. \]

Case I. Suppose $q_i$ satisfies Property (i) and $q_j$ satisfies Property (ii).
Then there exists a $k$ such that $\pi_k(q_i) \in I_{n-1}$. Furthermore,
Property (ii) implies that $\pi_k(q_j) \in I_{n-1}$. A very technical argument
shows that

\[ \pi_k(f) \in I_{n-1} \]

and furthermore, this projection preserves the initial terms. In summary, to
check that $\pi_k(f) \in I_{n-1}$, it suffices to ensure that
$a_s^{(n)} \mid \varphi_{n-1}(\pi_k(q_{g_3} q_{h_1} q_{h_2}))$ if and only if
$a_s^{(n)} \mid \varphi_{n-1}(\pi_k(q_{g_2} q_{h_3} q_{h_4}))$, where $s$ is the
sum of the observations on the leaves of the $(n-1)$-leaf tree obtained from $T$
by deleting leaf $(k)$. There are two cases corresponding to the parity of $n$.
If $n$ is odd, there are additional subcases determined by the correspondence of
the images of the variables in the two monomials of $f$ under $\varphi_{n-1}$.
The facts that $q_i$ and $q_j$ satisfy Properties (i) and (ii), respectively,
play a crucial role in the argument. Checking all the cases then shows that
$\pi_k(f) \in I_{n-1}$ and that initial terms are preserved under this
projection.

Applying the induction hypothesis then finishes the proof.

Case II. Suppose both $q_i$ and $q_j$ satisfy Property (i). Then there is a
$q_k \in Q_n$ satisfying Property (ii) where both $S(q_i, q_k)$ and
$S(q_j, q_k)$ reduce to zero. The three-pair criterion ([8]) provides the
desired result.

Case III. If both $q_i$ and $q_j$ satisfy Property (ii), then it can be seen
from the construction preceding this Proposition that the initial terms are
relatively prime, so their S-polynomial need not be considered. □

Proposition 3 has important theoretical consequences. Let $S$ be a polynomial
ring over the field $K$. Recall ([4]) that $S/I$ is Koszul if the field $K$ has
a linear resolution as a graded $S/I$-module:

\[ \cdots \to (S/I)^{\beta_2}(-2) \to (S/I)^{\beta_1}(-1) \to S/I \to K \to 0. \]

An ideal $I \subset S$ is said to be quadratic if it is generated by quadrics.
$S/I$ is quadratic if its defining ideal $I$ is quadratic, and it is G-quadratic
if $I$ has a quadratic Gröbner basis. It is known (e.g. [1]) that if $S/I$ is
G-quadratic, then it is Koszul, which in turn implies it is quadratic. The
reverse implications do not hold in general. We have just found an infinite
family of toric varieties whose coordinate rings $S/I$ are G-quadratic.

Corollary 1. The coordinate ring of the toric variety whose defining ideal is
$I_n$ is Koszul for every $n$.

The approach developed here produces the list of generators for the kernel of
$B_n$, all of which are of degree two. In addition, by constructing the toric
ideals of invariants inductively, we are able to explicitly calculate the
quadratic Gröbner bases. In light of the conjecture posed in [11] that the ideal
of phylogenetic invariants for the group of order $k$ is generated in degree at
most $k$, we are working on generalizing the above approach to any abelian group
of order $k$. In particular, we want to give a description of the lattice basis
ideal $I_L$ and the ideal of invariants $I$ for
$G = \mathbb{Z}_2 \times \mathbb{Z}_2$ with generators of degree at most 4.
These phylogenetic ideals are of interest to computational biologists.